Operating method for an automated language recognizer intended for the speaker-independent language recognition of words in different languages and automated language recognizer

ABSTRACT

The invention relates to an operating method for an automated language recognizer intended for the speaker-independent language recognition of words from different languages, particularly for recognizing names from different languages. The method is based on a language defined as the mother tongue and has an input phase for establishing a language recognizer vocabulary. Phonetic transcripts are determined for words in various languages in order to obtain phoneme sequences for pronunciation variants. The phonemes of each language are then mapped to the relevant phoneme set of the mother tongue to determine phoneme sequences that correspond to pronunciation variants.

BACKGROUND

The invention relates to an operating method of an automatic language recognizer for speaker-independent language recognition of words of different languages and a corresponding automatic language recognizer.

For phoneme-based language recognition, a language-recognition vocabulary is required, containing phonetic descriptions of all the words to be recognized. Typically, words are represented by sequences or chains of phonemes in the vocabulary. During a language recognition process, a search is conducted for the best path through the various phoneme sequences found in the vocabulary. This search can, for example, take place by means of the Viterbi algorithm. For continuous language recognition, the probabilities for transitions between words can also be modeled and included in the Viterbi algorithm.
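The best-path search mentioned above can be illustrated with a minimal Viterbi sketch. It is an illustration under stated assumptions rather than the recognizer described here: the two states, the discrete observation symbols ("burst", "voiced") and all probabilities are hypothetical toy values, whereas a real recognizer would score acoustic feature vectors against Hidden Markov model states.

```python
import math

def viterbi(observations, states, start, trans, emit):
    """Return (probability, state path) of the most likely state sequence.

    Assumes every probability used is non-zero, since logs are taken.
    """
    # Work in log space so that long sequences do not underflow.
    best = {
        s: (math.log(start[s]) + math.log(emit[s][observations[0]]), [s])
        for s in states
    }
    for obs in observations[1:]:
        step = {}
        for s in states:
            # Pick the best predecessor state for s at this time step.
            prev = max(states, key=lambda p: best[p][0] + math.log(trans[p][s]))
            score = (best[prev][0] + math.log(trans[prev][s])
                     + math.log(emit[s][obs]))
            step[s] = (score, best[prev][1] + [s])
        best = step
    score, path = max(best.values(), key=lambda entry: entry[0])
    return math.exp(score), path

# Hypothetical toy model: two phoneme states and two observation symbols.
STATES = ["p", "a"]
START = {"p": 0.9, "a": 0.1}
TRANS = {"p": {"p": 0.5, "a": 0.5}, "a": {"p": 0.1, "a": 0.9}}
EMIT = {"p": {"burst": 0.8, "voiced": 0.2}, "a": {"burst": 0.1, "voiced": 0.9}}

PROB, PATH = viterbi(["burst", "voiced", "voiced"], STATES, START, TRANS, EMIT)
print(PATH)  # ['p', 'a', 'a']
```

For continuous recognition, word-transition probabilities would enter the same recursion as additional transition terms between the last state of one word and the first state of the next.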

Phonetic transcripts of the words to be recognized form the basis of phoneme-based language recognition. Therefore, at the start of a phoneme-based language recognition process, the first task is to obtain phonetic transcripts for the words. Phonetic transcripts can be generally defined as the phonetic descriptions of words from a target vocabulary. Obtaining phonetic transcripts is particularly relevant for words that are not known to the language recognizer.

Mobile or cordless telephones are known that enable speaker-dependent name selection. In this case, a user of such a telephone must train the entries contained in the electronic telephone book of the telephone in order to be able to subsequently use the name selection by spoken word. Normally, no other user can use this feature because the speaker-dependent name selection is suitable for only one person, i.e. for the person who has trained the name selection. To overcome this problem, the entries in the electronic telephone book can be changed to phonetic transcripts.

To determine the phonetic transcript from a written word, for example from a telephone book entry, various approaches are known in the art. One example is a dictating system that is used with a PC. With dictating systems of this kind, a lexicon of typically more than 10,000 words with an allocation of letter sequences to the phoneme sequences is normally stored. Because a lexicon of this kind requires a very high storage capacity, it is not practical for mobile terminal devices such as mobile or cordless telephones to wholly incorporate this configuration.

Systems are also known whereby the conversion of a word to its phonetic transcript is rule-based, or takes place using specially trained neural networks. As with the lexicon, this approach has the disadvantage that the language in which the phoneme sequences are to be realized must be specified. However, names from different languages may be present, particularly in electronic telephone books. On a mobile device, converting words from different languages would be burdensome to wholly implement under the above configuration.

Other multilingual systems for determining phoneme sequences and language recognition have been developed. These systems enable phoneme sequences to be created from different languages.

Under still other configurations, a user speaks the words into a language recognition system that automatically generates sequences of phonemes. However, for large vocabularies (e.g., an electronic telephone book with 80 entries), this is no longer acceptable for the user.

SUMMARY OF THE INVENTION

The present disclosure provides an operating system and method for an automatic language recognizer for speaker-independent language recognition of words from various languages and also a corresponding automatic language recognizer that is simple to implement, is particularly suitable for use in mobile terminal devices and can be realized at reasonable cost.

As an example, a method for voice recognition is provided including the steps of:

(a) determining the phonetic transcripts of words for N various languages, in order to obtain N first phoneme sequences per word corresponding to N first pronunciation variants;

(b) implementing a mapping of the phonemes of each language to the relevant phoneme set of the mother tongue;

(c) applying the mapping implemented in step (b) to the N first phoneme sequences for each word determined in step (a), whereby for each word N second phoneme sequences corresponding to N second pronunciation variants are obtained that can be recognized by means of a mother tongue language recognizer; and

(d) creating a language recognition vocabulary with the N second phoneme sequences per word, obtained in the preceding step, for the mother tongue language recognizer.
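Steps (a) to (d) can be sketched as follows. This is a minimal illustration, not the disclosed implementation: the lookup tables standing in for the per-language transcribers, the phoneme symbols and the phoneme maps onto a German mother-tongue phoneme set are all hypothetical toy data, and the function name `build_vocabulary` is an assumption.

```python
# Hypothetical stand-ins for per-language transcribers (step a).
# A real system would use rules or neural networks instead of a lookup.
TRANSCRIBERS = {
    "french": {"chirac": ["S", "i", "R", "a", "k"]},
    "english": {"chirac": ["tS", "I", "r", "{", "k"]},
}

# Hypothetical maps from each language's phonemes onto the phoneme set
# of the mother tongue, here assumed to be German (step b).
PHONEME_MAPS = {
    "french": {"S": "S", "i": "i:", "R": "R", "a": "a", "k": "k"},
    "english": {"tS": "tS", "I": "I", "r": "R", "{": "E", "k": "k"},
}

def build_vocabulary(words, transcribers, phoneme_maps):
    """Return {word: [(language, second phoneme sequence), ...]}."""
    vocabulary = {}
    for word in words:
        variants = []
        for language, lexicon in transcribers.items():
            first = lexicon[word]                    # step (a): first sequence
            mapping = phoneme_maps[language]         # step (b): phoneme map
            second = tuple(mapping[p] for p in first)  # step (c): apply map
            variants.append((language, second))
        vocabulary[word] = variants                  # step (d): vocabulary entry
    return vocabulary

VOCAB = build_vocabulary(["chirac"], TRANSCRIBERS, PHONEME_MAPS)
for language, phonemes in VOCAB["chirac"]:
    print(language, " ".join(phonemes))
```

Keeping the source language next to each second phoneme sequence is a design choice of this sketch; it makes later pruning by language or by distance straightforward.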

As another example, a system for voice recognition is provided including: a mother tongue language recognizer; a first processing module for determining the phonetic transcripts of words for N various languages in each case, in order to obtain N first phoneme sequences for each word corresponding to N first pronunciation variants; a second processing module for implementing a mapping of the phonemes of each language to the particular phoneme set of the mother tongue; a third processing module for applying the mapping, implemented by means of the second processing module, to the N first phoneme sequences for each word determined by means of the first processing module, with N second phoneme sequences corresponding to N second pronunciation variants being obtained per word, that can be recognized by means of the mother tongue language recognizer; and a fourth processing module for creating a language recognition vocabulary with the N second phoneme sequences per word, obtained by the third processing module, for the mother tongue language recognizer.

BRIEF DESCRIPTION OF THE DRAWINGS

The invention and its wide variety of potential embodiments will be more readily understood through the following detailed description, with reference to the accompanying drawing in which:

FIG. 1 is a schematic flow diagram of the input phase for creation of a language recognition vocabulary in accordance with an exemplary embodiment of the invention.

DETAILED DESCRIPTION

Under an exemplary embodiment, phonetic transcripts of words for N various languages are determined and then reprocessed and applied to a phoneme-based monolingual language recognizer. This procedure works under the assumption that a user of the voice recognizer normally speaks in his/her mother tongue. The user may also pronounce foreign-language words, such as names, with a mother-tongue nuance (i.e. an accent) that can be roughly modeled by a mother-tongue language recognizer. The operating method is therefore based on a language defined as the mother tongue.

Each language can thus be described with different phonemes suitable for the particular language. It is known, however, that many phonemes in different languages resemble one another. An example of this is the “p” in English and German.

This fact is utilized in multilingual language recognition. In this case a single Hidden Markov model is created for the collection of languages, by means of which several languages can be recognized simultaneously. However, this leads to a very large Hidden Markov model with a lower recognition rate than a monolingual Hidden Markov model. Furthermore, if the collection of languages is extended, for example by a further language, a new Hidden Markov model has to be created, which is very expensive.

According to an exemplary embodiment, in a first step of the input phase for creation of a language recognition vocabulary of an operating procedure of an automated language recognizer for speaker-independent language recognition of words from various languages, particularly for the recognition of names from various languages, the phonetic transcripts of words for N various languages are determined in each case, in order to obtain N first phoneme sequences per word corresponding to N first pronunciation variants. In a second step, the similarities between the languages are utilized. To do this, a mapping of the phonemes of each language to the particular phoneme set of the mother tongue is implemented. In a third step, the implemented mapping is applied to the N first phoneme sequences determined in the first step for each word. In this way, N second phoneme sequences corresponding to N second pronunciation variants are obtained for each word. After a language recognition vocabulary has been created using the N second phoneme sequences per word obtained in the preceding step, words from the N various languages can then be recognized by means of the mother-tongue language recognizer.

A look-up method in a lexicon configuration fails with mobile terminal devices because of the large memory requirement, and in multilingual language recognition the model is optimized for a fixed set of languages, so that new Hidden Markov models have to be created and optimized for each new language. By contrast, by means of grapheme/phoneme conversion into several languages in accordance with the invention, a multilingual system is created that can be implemented with relatively simple means. In addition to the grapheme-to-phoneme conversion, a mapping, i.e. a depiction between the individual languages, is implemented. The phoneme sequence determination and the succeeding mapping or depiction normally run offline on a device, for example a mobile telephone, a personal digital assistant or personal computer with corresponding software, and are therefore not time-critical. The resources required for this can be held in an internal/external memory.

Because the language recognition vocabulary created by means of the aforementioned procedure includes N pronunciation variants for each word, the search effort during language recognition can be great. To reduce this, a further step can be introduced under the exemplary embodiment, which is performed before the creation of the language recognition vocabulary and after generation of the N second phoneme sequences per word. In this step, the N second phoneme sequences corresponding to the N second pronunciation variants of each word are processed, in that each second phoneme sequence is analyzed and classified by means of suitable distances, particularly the Levenshtein distance, and the N second phoneme sequences of each word are reduced to a few, preferably two to three, phoneme sequences, in that the pronunciation variants that are least similar to the pronunciation variants of the mother tongue are omitted. Simply expressed, the least important pronunciation variants are omitted by this reduction, thus reducing the search effort during language recognition.
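The reduction step can be sketched with a standard Levenshtein distance computed over phoneme symbols. The sketch below is one possible reading under stated assumptions: the mother-tongue variant is taken as the single reference sequence, the function names and the phoneme sequences are hypothetical, and ties are simply left in input order.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        cur = [i]
        for j, y in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,              # deletion
                           cur[j - 1] + 1,           # insertion
                           prev[j - 1] + (x != y)))  # substitution or match
        prev = cur
    return prev[-1]

def reduce_variants(variants, reference, keep=3):
    """Keep the `keep` variants closest to the mother-tongue reference."""
    ranked = sorted(variants, key=lambda v: levenshtein(v, reference))
    return ranked[:keep]

# Hypothetical second phoneme sequences for one word.
REFERENCE = ("S", "i", "R", "a", "k")        # assumed mother-tongue variant
VARIANTS = [
    ("k", "I", "r", "a", "ts"),              # distance 4
    ("tS", "I", "R", "E", "k"),              # distance 3
    ("S", "i:", "R", "a", "k"),              # distance 1
    ("S", "i", "R", "a", "k"),               # distance 0
]
KEPT = reduce_variants(VARIANTS, REFERENCE, keep=3)
```

With these toy values the variant at distance 4 is dropped, leaving three sequences for the vocabulary.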

A further reduction in effort can be achieved in that a language identification and reduction is carried out before the first step. As part of this language identification, the probability of each word to be recognized belonging to each of the N various languages is determined. Using the results of this language identification, the number of languages to be processed in the first step of the method is reduced, preferably to two or three different languages. The languages with the lowest probability are not further processed. For a specific word, the result of the language identification can, for example, be as follows: German 55%, UK English 16%, US English 14%, Swedish 3%, etc. Under this example, if only three languages are desired, the Swedish language is omitted, i.e. not further processed.
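The language reduction in the example above amounts to keeping the most probable languages for a word. A minimal sketch, assuming the language identifier has already produced per-word probabilities; the function name `reduce_languages` is hypothetical and the scores are the example figures from the text:

```python
def reduce_languages(probabilities, keep=3):
    """Return the `keep` most probable languages, most probable first."""
    ranked = sorted(probabilities.items(), key=lambda item: item[1], reverse=True)
    return [language for language, _ in ranked[:keep]]

# Example result of the language identification for one word.
SCORES = {"German": 0.55, "UK English": 0.16, "US English": 0.14, "Swedish": 0.03}

LANGUAGES = reduce_languages(SCORES, keep=3)
print(LANGUAGES)  # ['German', 'UK English', 'US English']
```

A threshold on the probability, as in the claims, would be an alternative to a fixed count; which of the two is preferable is not settled by this sketch.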

The determination of the phonetic transcripts in the first step of the method preferably takes place by means of at least one neural network. Neural networks have proved suitable for determining phonetic transcripts from written words, because they produce good results with regard to accuracy, and particularly with regard to the speed of processing, and can be easily implemented, particularly in software.

A Hidden Markov model, particularly one that has been created for the language defined as a mother tongue, is suitable for use as a mother tongue language recognizer.

The exemplary embodiment of the invention relates to a language recognizer for speaker-independent language recognition of words from various languages, particularly for recognizing names from various languages. In this case, one of the various languages is defined as the mother tongue. The language recognizer includes:

-   a mother tongue language recognizer,
-   a first processing module for determining the phonetic transcripts of words, particularly for N various languages, in order to obtain N first phoneme sequences corresponding to N first pronunciation variants per word,
-   a second processing module for implementing a mapping of the phonemes of each language to the particular phoneme set of the mother tongue,
-   a third processing module for applying the mapping, implemented by the second processing module, to the N first phoneme sequences for each word, determined with the first processing module, whereby N second phoneme sequences corresponding to N second pronunciation variants are obtained per word, that can be recognized by the mother tongue language recognizer, and
-   a fourth processing module for creating a language recognition vocabulary with the N second phoneme sequences per word obtained by the third processing module for the mother tongue language recognizer.

Under a preferred embodiment, the automatic language recognizer has a fifth processing module for processing the N second phoneme sequences corresponding to the N second pronunciation variants of each word. The fifth processing module is designed in such a way that each second phoneme sequence is analyzed and classified using suitable distances, particularly the Levenshtein distance, and the N second phoneme sequences of each word are reduced to a few, preferably two to three, phoneme sequences.

Furthermore, the automatic language recognizer can have a language identifier and a language reducer. The language identifier is connected before the first processing module and, for each word to be recognized, it determines the probability of it belonging to each of the N different languages. The language reducer reduces the number of languages to be processed by the first processing module, preferably down to two to three different languages, so that the languages with the lowest probability are not further processed. The language identifier and language reducer substantially reduce the processing effort of the automatic language recognizer, both in the input phase and in the recognition phase.

Preferably, the first processing module has at least one neural network for determining the phonetic transcripts.

Furthermore, the mother tongue language recognizer has, in a preferred form of embodiment, a Hidden Markov model that has been created for the language defined as the mother tongue.

Turning to FIG. 1, a speaker-related name is selected on a mobile telephone using the names from a telephone book, for a German-speaking user. In the telephone book there are, in addition to the mainly German-language names, also some foreign-language names. A transcriber for the graphemic representation of the names is set for the German, Italian, Czech, Greek and Turkish languages, for a total of N=5 different languages.

In an initial step S0 of FIG. 1, a language identification of the supplied words 10 or entries in the telephone book is undertaken. More precisely, each individual word is analyzed with regard to the probability of it belonging to one of the five languages. If, for example, a German name is being processed, the probability for German is very high. For the other four languages, i.e. Italian, Czech, Greek and Turkish, the probability is much lower. Using the probabilities determined per word, the language with the lowest probability is omitted during subsequent processing. As an example, this means that in the succeeding processing operation there are then only four, instead of five, languages that have to be processed.

In a first step S1 of FIG. 1, the phonetic transcript for each word is determined for each of the four different languages. In this way, four first phoneme sequences corresponding to four first pronunciation variants are obtained for each word.

In a second step S2 of FIG. 1, a mapping of the phonemes of each of the four languages to the particular phoneme set of the mother tongue is implemented.

In a third step S3 of FIG. 1, this mapping is applied to the four first phoneme sequences 12 obtained in the first step S1. In this way, four second phoneme sequences 14 corresponding to the four second pronunciation variants are obtained for each word. The four second phoneme sequences 14 can already be recognized in a mother tongue language recognizer.

To further reduce the processing effort for the language recognizer, each second phoneme sequence is analyzed and classified for each word using the Levenshtein distance (step S4). A fifth step S5 then takes place, in which the analyzed and classified second phoneme sequences per word are reduced to three phoneme sequences.

Finally, in a last step S6, a language recognition vocabulary is created for the mother tongue language recognizer with the three second phoneme sequences per word obtained in the fifth step S5. By reducing the phoneme sequences in the fifth step S5 of the method, the language recognition vocabulary to be saved and to be analyzed during a language recognition process is substantially reduced. In a practical application of the language recognizer, this has the advantage of a lower storage capacity requirement and of faster processing, because the vocabulary to be searched through is smaller.

After the described procedure has been completed, the user can, by means of language recognition, make a name selection, i.e. make a language-controlled call up of stored telephone numbers using the name of the subscriber, without first having to explicitly pronounce the name of the subscriber to be called, i.e. without having to “train”.

Furthermore, if a user finds that a certain name is not well recognized, the user can call up the language recognition menu of his mobile telephone and then select a “name selection” application. By means of this application, the user can now be offered one or several ways of improving the language recognition of a certain word, or more precisely of a certain name, from the electronic telephone book of the mobile telephone. Some of these possibilities are briefly explained in the following by way of example.

1. As an alternate embodiment, the user can again speak the poorly recognized or unrecognized word into the mobile telephone and then have it converted into a phoneme sequence by means of the language recognizer contained in the mobile telephone. In this case, pronunciation variants previously automatically determined are either completely or partially removed from the vocabulary of the language recognizer, depending on their closeness to the newly determined phoneme sequence.

2. As yet another alternate embodiment, the user can have a kind of phonetic transcription of the poorly recognized or unrecognized entry in the electronic telephone book shown on the display of the mobile telephone. As an example, if there is a poor match to the user's pronunciation, the user can edit this kind of phonetic transcription. For example, by an automatic transcription of the entry “Jacques Chirac”, “Jakwes Shirak” can be stored as a phonetic transcription. If this phonetic transcription now appears incorrect to the user, he can edit it using his mobile telephone, for example to “Zhak Shirak”. The system can then also determine the phonetic description and reenter this in the language recognition vocabulary. This should enable the automatic language recognition to function reliably.

3. Also, the user can substantially improve the recognition by explicitly specifying the language from which a poorly recognized or even unrecognized name originates. In such a case, all the pronunciation variants of the name that are not assigned to the explicitly specified language are removed from the language recognition vocabulary.
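The third possibility, removing all variants of a name that are not assigned to the specified language, can be sketched as a simple filter. This assumes, purely for illustration, that the vocabulary stores each variant together with its source language; the data layout, the function name and the phoneme symbols are hypothetical.

```python
def restrict_to_language(vocabulary, word, language):
    """Return a copy of the vocabulary in which `word` keeps only the
    pronunciation variants assigned to `language`."""
    updated = dict(vocabulary)
    updated[word] = [
        (lang, phonemes) for lang, phonemes in vocabulary[word] if lang == language
    ]
    return updated

# Hypothetical vocabulary entry with one variant per source language.
VOCABULARY = {
    "chirac": [
        ("german", ("k", "i", "R", "a", "k")),
        ("french", ("S", "i", "R", "a", "k")),
        ("english", ("tS", "I", "R", "E", "k")),
    ]
}

UPDATED = restrict_to_language(VOCABULARY, "chirac", "french")
```

Returning a copy rather than editing in place leaves the original entries available should the user later undo the selection; that choice belongs to this sketch, not to the description.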

In addition, although the invention is described in connection with mobile telephones, it should be readily apparent that the invention may be practiced with any type of communicating device, such as a personal digital assistant or a PC. It is also understood that the device portions and segments described in the embodiments above can be substituted with equivalent devices to perform the disclosed methods.

1. A method for automated language recognition of words from different languages said method embodied as computer program instructions encoded in tangible, non-transitory computer readable media associated with a mobile device and comprising the steps of: (a) loading a phoneme set associated with a language specified as a mother tongue into a mother tongue language recognizer; (b) for each of a plurality of words, determining phonetic transcripts for the word for N various languages not specified as the mother tongue to generate N first phoneme sequences for the word corresponding to N first pronunciation variants, each of the N first phoneme sequences formed from phonemes associated with one of the N different languages; (c) determining a phoneme map by mapping the generated first phoneme sequences of each of said N languages to a relevant phoneme set of the mother tongue; (d) for each of the plurality of words, applying the phoneme map to each of the N first phoneme sequences for that word in order to translate the N first phoneme sequences into N second phoneme sequences, each of the N second phoneme sequences formed from phonemes associated with the mother tongue language, wherein each of the N first phoneme sequences of the N various languages is translated into a corresponding second phoneme sequence of the mother tongue language (a) regardless of whether the mobile device includes a speech model for each of the N various languages, and (b) regardless of whether the mother tongue language is the most acoustically similar to each of the N various languages, with respect to the respective first and second phoneme sequences, and such that for each word, two different phonetic transcripts are generated for each of the N different languages, including (1) the N first phoneme sequences for the word, each formed from phonemes associated with one of the N different languages, and (2) the N second phoneme sequences for the word, each formed by applying the phoneme map to translate one of the N first phoneme sequences formed from phonemes associated with one of the N different languages into a sequence of phonemes associated with the mother tongue language; and (e) processing said N second phoneme sequences with the phoneme set associated with the language specified as the mother tongue to identify at least one of a matching word and a similar word.
2. The method according to claim 1, further comprising a step of adding the N second phoneme sequences for each word in a language recognition vocabulary located in the mother tongue language recognizer.
3. The method according to claim 1, further determining distances to the N second pronunciation variants based at least on the processed N second phoneme sequences.
4. The method according to claim 3, further comprising a step of classifying each N second phoneme sequences to identify respective distances.
5. The method according to claim 4, further comprising a step of eliminating any N second phoneme sequences that do not exceed a predetermined threshold.
6. The method according to claim 5, wherein the distances are Levenshtein distances.
7. The method according to claim 1, further comprising the step of determining probabilities that each word for N various languages not specified as the mother tongue belong to a specified set of languages, said step of determining probabilities occurring before step (a).
8. The method according to claim 7, further comprising the step of eliminating languages from said specified set that do not exceed a predetermined threshold.
9. The method according to claim 1, wherein the step of determining the phonetic transcripts of each word for N various languages not specified as the mother tongue is performed by at least one neural network.
10. The method according to claim 1, wherein processing said N second phoneme sequences with the phoneme set associated with the language specified as a mother tongue is performed using a Hidden Markov Model.
11. An automatic language recognizing apparatus, including computer program modules encoded in tangible, non-transitory computer readable media associated with a mobile device, the computer program modules comprising: a mother tongue language recognizer, said recognizer storing a phoneme set of a predetermined mother tongue; a first processing module for determining phonetic transcripts for each word of a plurality of words from N various languages in order to obtain N first phoneme sequences for each word corresponding to N first pronunciation variants, each of the N first phoneme sequences formed from phonemes associated with one of the N different languages; a second processing module for implementing a mapping of first phoneme sequence of each of N various languages to a particular phoneme set of the mother tongue; a third processing module for applying the implemented mapping of phonemes to translate the N first phoneme sequences for each word determined by means of the first processing module into N second phoneme sequences corresponding to N second pronunciation variants being obtained for each word, the N second phoneme sequences formed from phonemes associated with the mother tongue language and being recognized by the mother tongue language recognizer; wherein the third processing module translates each of the N first phoneme sequences of the N various languages into a corresponding second phoneme sequence of the mother tongue language (a) regardless of whether the mobile device includes a speech model for each of the N various languages, and (b) regardless of whether the mother tongue language is the most acoustically similar to each of the N various languages, with respect to the respective first and second phoneme sequences, and such that for each word, two different phonetic transcripts are generated for each of the N different languages, including (1) the N first phoneme sequences for the word, each formed from phonemes associated with one of the N different languages, and (2) the N second phoneme sequences for the word, each formed by applying the phoneme map to translate one of the N first phoneme sequences formed from phonemes associated with one of the N different languages into a sequence of phonemes associated with the mother tongue language; and a fourth processing module for creating a language recognizable vocabulary with the N second phoneme sequences for each word, obtained by the third processing module, for the mother tongue language recognizer.
12. The automatic language recognizing apparatus according to claim 11, further comprising a fifth processing module for processing the N second phoneme sequences corresponding to the N second pronunciation variants of each word to obtain distances for each N second phoneme sequence.
13. The automatic language recognizing apparatus according to claim 12, wherein said distances are Levenshtein distances.
14. The automatic language recognizing apparatus according to claim 13, wherein the N second phoneme sequence distances not exceeding a predetermined threshold are eliminated from further processing.
15. The automatic language recognizing apparatus according to claim 11, further comprising a language identifier, coupled to the first processing module, wherein the language identifier determines a probability of each word belonging to each of the N various languages.
16. The automatic language recognizing apparatus according to claim 15, further comprising a language reducer that reduces the number of languages from the first processing module to be processed if said probability does not exceed a predetermined threshold.
17. The automatic language recognizing apparatus according to claim 11, wherein the first processing module comprises at least one neural network for determining the phonetic transcripts.
18. The automatic language recognizing apparatus according to claim 11, wherein the mother tongue language recognizer comprises a Hidden Markov model that has been created for the phoneme set of the predetermined mother tongue.